Digitizing a Million Books: Challenges for Document Analysis

Internal identifier: 001088 (Main/Exploration); previous: 001087; next: 001089

Authors: Pramod Sankar [India]; Vamshi Ambati [United States]; Lakshmi Pratha [India]; V. Jawahar [India]

Source:

Lecture Notes in Computer Science [0302-9743]; 2006.

RBID : ISTEX:E96E767CE48405122392E7508C98969E20DA18DE

Abstract

Abstract: This paper describes the challenges facing the document image analysis community in building large digital libraries with diverse document categories. The challenges are identified from the experience of ongoing activities toward digitizing and archiving one million books. A smooth workflow has been established for archiving large quantities of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising from the diversity of content in digital libraries.

URL:

https://api.istex.fr/document/E96E767CE48405122392E7508C98969E20DA18DE/fulltext/pdf

DOI: 10.1007/11669487_38

Affiliations:

Links to previous steps (curation, corpus, ...)

to stream Istex, to step Corpus: 000213
to stream Istex, to step Curation: 000210
to stream Istex, to step Checkpoint: 000A61
to stream Main, to step Merge: 001105
to stream Main, to step Curation: 001088

The document in XML format

<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc><titleStmt><title xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author><name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
</author>
<author><name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</author>
<author><name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
</author>
<author><name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:E96E767CE48405122392E7508C98969E20DA18DE</idno>
<date when="2006" year="2006">2006</date>
<idno type="doi">10.1007/11669487_38</idno>
<idno type="url">https://api.istex.fr/document/E96E767CE48405122392E7508C98969E20DA18DE/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000213</idno>
<idno type="wicri:Area/Istex/Curation">000210</idno>
<idno type="wicri:Area/Istex/Checkpoint">000A61</idno>
<idno type="wicri:doubleKey">0302-9743:2006:Sankar P:digitizing:a:million</idno>
<idno type="wicri:Area/Main/Merge">001105</idno>
<idno type="wicri:Area/Main/Curation">001088</idno>
<idno type="wicri:Area/Main/Exploration">001088</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title level="a" type="main" xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author><name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
<affiliation wicri:level="4"><country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Institute for Software Research International, Carnegie Mellon University</wicri:regionArea>
<placeName><settlement type="city">Pittsburgh</settlement>
<region type="state">Pennsylvanie</region>
</placeName>
<orgName type="university">Université Carnegie-Mellon</orgName>
</affiliation>
</author>
<author><name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
</author>
<author><name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country>
<wicri:regionArea>Regional Mega Scanning Centre, International Institute of Information Technology, Hyderabad</wicri:regionArea>
<wicri:noRegion>Hyderabad</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Inde</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series><title level="s">Lecture Notes in Computer Science</title>
<imprint><date>2006</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">E96E767CE48405122392E7508C98969E20DA18DE</idno>
<idno type="DOI">10.1007/11669487_38</idno>
<idno type="ChapterID">38</idno>
<idno type="ChapterID">Chap38</idno>
</biblStruct>
</sourceDesc>
<seriesStmt><idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass></textClass>
<langUsage><language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">Abstract: This paper describes the challenges for document image analysis community for building large digital libraries with diverse document categories. The challenges are identified from the experience of the on-going activities toward digitizing and archiving one million books. Smooth workflow has been established for archiving large quantity of books, with the help of efficient image processing algorithms. However, much more research is needed to address the challenges arising out of the diversity of the content in digital libraries.</div>
</front>
</TEI>
<affiliations><list><country><li>Inde</li>
<li>États-Unis</li>
</country>
<region><li>Pennsylvanie</li>
</region>
<settlement><li>Pittsburgh</li>
</settlement>
<orgName><li>Université Carnegie-Mellon</li>
</orgName>
</list>
<tree><country name="Inde"><noRegion><name sortKey="Sankar, Pramod" sort="Sankar, Pramod" uniqKey="Sankar P" first="Pramod" last="Sankar">Pramod Sankar</name>
</noRegion>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<name sortKey="Jawahar, V" sort="Jawahar, V" uniqKey="Jawahar V" first="V." last="Jawahar">V. Jawahar</name>
<name sortKey="Pratha, Lakshmi" sort="Pratha, Lakshmi" uniqKey="Pratha L" first="Lakshmi" last="Pratha">Lakshmi Pratha</name>
</country>
<country name="États-Unis"><region name="Pennsylvanie"><name sortKey="Ambati, Vamshi" sort="Ambati, Vamshi" uniqKey="Ambati V" first="Vamshi" last="Ambati">Vamshi Ambati</name>
</region>
</country>
</tree>
</affiliations>
</record>
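For readers who want to process such a record outside Dilib, the following is a minimal sketch using Python's standard-library `xml.etree.ElementTree`. It works on an abridged copy of the record above (attributes such as `sortKey` are dropped), and it binds the `wicri:` prefix, which the record uses without declaring, to a placeholder URI; that URI is an assumption, not part of the record. It then extracts the DOI and title and regroups authors by country, roughly mirroring the `<affiliations>` tree. The names `parse_record`, `summarize` and `WICRI_NS` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the record above (assumption: only the elements this
# sketch reads are kept; attributes such as sortKey are dropped).
RECORD = """<record><TEI wicri:istexFullTextTei="biblStruct"><teiHeader><fileDesc>
<publicationStmt><idno type="doi">10.1007/11669487_38</idno></publicationStmt>
<sourceDesc><biblStruct><analytic>
<title level="a" type="main" xml:lang="en">Digitizing a Million Books: Challenges for Document Analysis</title>
<author><name last="Sankar">Pramod Sankar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country></affiliation></author>
<author><name last="Ambati">Vamshi Ambati</name>
<affiliation wicri:level="4"><country xml:lang="fr">États-Unis</country></affiliation></author>
<author><name last="Pratha">Lakshmi Pratha</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country></affiliation></author>
<author><name last="Jawahar">V. Jawahar</name>
<affiliation wicri:level="1"><country xml:lang="fr">Inde</country></affiliation>
<affiliation wicri:level="1"><country wicri:rule="url">Inde</country></affiliation></author>
</analytic></biblStruct></sourceDesc>
</fileDesc></teiHeader></TEI></record>"""

# Hypothetical namespace URI: the record uses the "wicri:" prefix without
# declaring it, which a namespace-aware parser rejects as an unbound prefix.
WICRI_NS = "urn:example:wicri"


def parse_record(text: str) -> ET.Element:
    """Declare the wicri prefix on the root element, then parse."""
    fixed = text.replace("<record>", f'<record xmlns:wicri="{WICRI_NS}">', 1)
    return ET.fromstring(fixed)


def summarize(root: ET.Element) -> dict:
    """Return the DOI, the title, and author names grouped by country."""
    doi = root.findtext(".//publicationStmt/idno[@type='doi']")
    title = root.findtext(".//analytic/title")
    by_country: dict[str, list[str]] = {}
    for author in root.iterfind(".//analytic/author"):
        name = author.findtext("name")
        # An author may carry several affiliations in the same country
        # (V. Jawahar has two); a set keeps each country once per author.
        countries = {c.text for c in author.iterfind("affiliation/country")}
        for country in sorted(countries):
            by_country.setdefault(country, []).append(name)
    return {"doi": doi, "title": title, "by_country": by_country}


if __name__ == "__main__":
    summary = summarize(parse_record(RECORD))
    print(summary["doi"])
    for country, names in summary["by_country"].items():
        print(country, "->", ", ".join(names))
```

Deduplicating per author is a design choice: the generated `<affiliations>` tree above lists V. Jawahar twice under `Inde` because of his two affiliation entries, whereas this sketch reports each author once per country.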

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001088 | SxmlIndent | more

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001088 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:E96E767CE48405122392E7508C98969E20DA18DE
   |texte=   Digitizing a Million Books: Challenges for Document Analysis
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

Exploration server on OCR
Warning: this site is under development! Warning: this site was generated automatically from raw corpora, so the information has not been validated.
